Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stats): Chart migration 7 #1162

Merged
merged 13 commits into from
Jan 7, 2025

Conversation

bragov4ik
Copy link
Contributor

@bragov4ik bragov4ik commented Dec 24, 2024

schedule was taken from here.

I encourage to review by commits (took feedback from the previous time)

Summary by CodeRabbit

  • New Features

    • Added new metrics for tracking contracts: new_contracts_24h and new_verified_contracts_24h.
    • Introduced a new group VerifiedContractsPageGroup for better update management.
    • Added a new tool for visualizing available timeslots in the update schedule.
  • Bug Fixes

    • Improved handling of transaction data for better accuracy in contract creation indexing.
  • Documentation

    • Updated and added documentation for the find_free_timeslot.py tool.
  • Chores

    • Added new dependencies to enhance functionality in various modules.

Copy link

coderabbitai bot commented Dec 24, 2024

Walkthrough

This pull request introduces comprehensive enhancements to the stats configuration and tracking system, focusing on adding new metrics for contracts and verified contracts. The changes span multiple files across the project, introducing two primary new counters: new_contracts_24h and new_verified_contracts_24h. These additions expand the system's capability to track contract-related metrics over a 24-hour period, with modifications to configuration files, database interaction modules, and update group scheduling. The implementation includes SQL statement generation, remote database source handling, and integration with existing charting infrastructure.

Possibly related PRs

Poem

🐰 Hop, hop, hurray! New metrics are here to play!
Contracts dancing, 24 hours bright,
Verified and fresh, a statistical delight!
CodeRabbit's charts now sparkle with glee,
Tracking blockchain's vibrant history! 🚀

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@bragov4ik bragov4ik force-pushed the bragov4ik/stats-migrate-backend-charts-7 branch 2 times, most recently from 4547cad to 8131878 Compare December 24, 2024 14:53
Base automatically changed from bragov4ik/stats-migrate-backend-charts-6 to main December 25, 2024 04:47
@bragov4ik bragov4ik force-pushed the bragov4ik/stats-migrate-backend-charts-7 branch from 8131878 to 85cd2c9 Compare December 25, 2024 04:50
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
stats/stats/src/data_source/kinds/remote_db/query/one.rs (2)

58-63: Add a doc comment for clarity
A short explanatory doc comment above StatementFromUpdateTime would help future maintainers or implementers understand its purpose.

+/// Trait that provides an SQL statement based on `update_time`
+/// and completed migrations. Intended for single-value queries
+/// whose timespan is derived from the current update time.
 pub trait StatementFromUpdateTime {
    fn get_statement(
        update_time: DateTime<Utc>,
        completed_migrations: &BlockscoutMigrations,
    ) -> Statement;
}

65-71: Consider unifying with PullOne to reduce duplication
PullOneValue and PullOne share a similar pattern and could likely be combined or parameterized further to reduce code duplication. This is not critical but worth considering for improved maintainability.

stats/config/utils/find_free_timeslot.py (2)

11-13: Promote enumerations over string constants
Defining DURATION_MENU_MANUAL and DURATION_MENU_CONFIG as string constants works, but consider using an Enum to improve readability and type-safety across the codebase.

+from enum import Enum
+
+class DurationSource(Enum):
+    MANUAL = "manual"
+    CONFIG = "config"
+
 # Replace references to DURATION_MENU_MANUAL / DURATION_MENU_CONFIG with:
- DURATION_MENU_MANUAL = "manual"
- DURATION_MENU_CONFIG = "config"
+ DURATION_MENU_MANUAL = DurationSource.MANUAL.value
+ DURATION_MENU_CONFIG = DurationSource.CONFIG.value

219-221: Fallback to default duration is a good practice
Falling back to self.default_duration if the manual input is invalid prevents unexpected crashes. It might help to also notify users that the default duration was applied, so they realize their custom input was ignored.

stats/stats/src/charts/counters/total_verified_contracts.rs (1)

21-22: Consider providing a doc comment for this struct
Documenting the purpose of TotalVerifiedContractsStatement will help maintainers quickly understand its role in forming SQL statements.

stats/stats/src/charts/counters/new_verified_contracts_24h.rs (2)

22-36: Consolidate filter logic into a single function if reusability is needed.
Overall, the usage of interval_24h_filter is correct for limiting rows to the last 24 hours. If there are multiple charts requiring similar logic, consider abstracting the filtration to keep the code DRY.


60-62: Documentation clarifies scope.
Marking that internal transactions are not included adds clarity to the usage context.

stats/config/utils/durations/durations_producer.py (3)

9-12: Consider clarifying regex patterns in docstring

The regex usage in convert_camel_to_snake is clear enough, but adding a short comment clarifying the capturing groups and expectations could be helpful for future maintainers.


15-19: Potential integration with custom period names

Currently, normalize_period handles a fixed map for certain periods. If there's a future need for custom period names or additional time units, consider centralizing the mapping or retrieving it from an external config.


21-56: Robust parsing for singleton and complex groups

The parse_rust_groups function does a good job extracting both singleton and complex groups. As a potential enhancement, you could guard against potential edge cases with more thorough error handling, for example if the file is empty or the regex fails to parse.

stats/stats/src/tests/mock_blockscout.rs (1)

144-145: Clarify the commented-out date

A short note on why "2022-11-17T00:00:00" is intentionally skipped can preclude confusion for future maintainers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7544481 and 85cd2c9.

📒 Files selected for processing (25)
  • stats/config/charts.json (2 hunks)
  • stats/config/layout.json (1 hunks)
  • stats/config/update_groups.json (1 hunks)
  • stats/config/utils/durations/.gitignore (1 hunks)
  • stats/config/utils/durations/README.md (1 hunks)
  • stats/config/utils/durations/data.csv.example (1 hunks)
  • stats/config/utils/durations/durations.json (1 hunks)
  • stats/config/utils/durations/durations_producer.py (1 hunks)
  • stats/config/utils/find_free_timeslot.py (3 hunks)
  • stats/config/utils/requirements.txt (1 hunks)
  • stats/stats-server/src/runtime_setup.rs (1 hunks)
  • stats/stats-server/tests/it/counters.rs (1 hunks)
  • stats/stats/src/charts/counters/mod.rs (2 hunks)
  • stats/stats/src/charts/counters/new_contracts_24h.rs (1 hunks)
  • stats/stats/src/charts/counters/new_verified_contracts_24h.rs (1 hunks)
  • stats/stats/src/charts/counters/total_contracts.rs (2 hunks)
  • stats/stats/src/charts/counters/total_verified_contracts.rs (2 hunks)
  • stats/stats/src/charts/db_interaction/mod.rs (1 hunks)
  • stats/stats/src/charts/db_interaction/utils.rs (1 hunks)
  • stats/stats/src/data_source/kinds/remote_db/mod.rs (1 hunks)
  • stats/stats/src/data_source/kinds/remote_db/query/mod.rs (1 hunks)
  • stats/stats/src/data_source/kinds/remote_db/query/one.rs (2 hunks)
  • stats/stats/src/tests/mock_blockscout.rs (3 hunks)
  • stats/stats/src/update_group.rs (2 hunks)
  • stats/stats/src/update_groups.rs (1 hunks)
✅ Files skipped from review due to trivial changes (5)
  • stats/stats/src/charts/db_interaction/mod.rs
  • stats/config/utils/durations/.gitignore
  • stats/config/utils/durations/README.md
  • stats/config/utils/durations/data.csv.example
  • stats/config/utils/durations/durations.json
🔇 Additional comments (57)
stats/stats/src/update_group.rs (4)

536-536: Confirm consistent naming usage between ContractsGrowth and NewContracts.

You have introduced both ContractsGrowth and NewContracts. If there's a dependency relationship between them, ensure that the names reflect their roles consistently (e.g., “Contracts Growth” might internally use “New Contracts” values). Otherwise, this import might lead to confusion or mismatch if developers expect a consistent naming scheme.


543-543: Clarify whether NewContracts should be part of this group.

The group definition includes only ContractsGrowth, but the tests suggest that NewContracts might be required as well. If this omission is intentional, to trigger an error for demonstration or testing, it's fine. Otherwise, consider adding NewContracts to this group if that chart belongs here logically.


549-549: Re-check dependency coverage.

You have only the mutex for ContractsGrowth here. Given the test below, there may be a functional dependency on NewContracts. Confirm whether you need to add NewContracts’s mutex to cover all actual dependencies.


568-568: Validate that it correctly flags the missing mutex.

This test ensures the error reports NewContracts as missing. This is good for confirming that required charts are present in the group’s mutexes. If ContractsGrowth no longer depends on NewContracts, consider removing this from the missing list to avoid confusion.

stats/stats/src/data_source/kinds/remote_db/query/one.rs (3)

4-4: Imports necessary for new functionality
Bringing TryGetable into scope is appropriate and essential for the upcoming usage in PullOneValue. Nothing to fix here.


9-9: New imports for data source types and timespan
Imports for BlockscoutMigrations, UpdateContext, WrappedValue, Timespan and TimespanValue are consistent with the new trait and struct usage. Looks good.

Also applies to: 12-12


72-94: Verify dropping the time portion is appropriate
When constructing the timespan, note that date_naive() strips away the time zone and hour/minute/second information. Ensure this aligns with the intended behavior for your 24-hour window or daily metrics.

✅ Verification successful

Based on the search results, I can provide the final response as the usage of date_naive() is consistent with the intended behavior across the codebase. Here's my analysis:

Dropping time portion is correct and consistent with timespan implementations

The usage of date_naive() to strip time information when constructing timespans is the correct approach because:

  1. All timespan implementations (Day, Month, Week, Year) consistently use only the date portion when creating timespans via from_date().
  2. The test cases across the codebase confirm this behavior, particularly in handling daily/monthly/yearly boundaries.
  3. The implementation aligns with the Timespan trait's documentation which indicates that timespans are date-based rather than time-based: I.e. for some date d, Timespan::from_date(d).into_date() != d.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search usage of `from_date(` across the codebase
# to confirm if ignoring the time portion is consistent elsewhere.

rg -A 5 "from_date\\("

Length of output: 41411

stats/config/utils/find_free_timeslot.py (5)

28-32: BooleanVar toggling is clear and intuitive
Good approach using tk.BooleanVar to toggle between config-based or manual durations. This cleanly signals the source of truth for task durations.


33-41: Validate loaded JSON content
While this block elegantly handles missing or invalid files, consider adding checks to validate the content structure of the durations JSON. This ensures that any malformed file is caught early before being used downstream.


68-74: Straightforward flow for loading JSON
The logic for loading and updating visualization & schedule list is consistent. Be sure that any missing fields in the loaded JSON are gracefully handled to prevent runtime errors.


228-232: Config-based vs manual durations logic looks consistent
Well-handled logic for determining which duration source to use. Make sure to confirm that each schedule name in schedules has a corresponding duration entry in task_durations if the config-based option is chosen.


322-330: Robust approach to dynamic tags parsing
Safely searching for "tasks_" prevents out-of-bounds errors. Switching to "/" as the separator is a concise fix. This logic is more reliable for future expansions.

stats/stats/src/charts/counters/total_verified_contracts.rs (6)

2-8: Imports look good
All newly introduced imports are cohesive and align well with the existing architecture.


13-13: Smart contracts entity usage
Using smart_contracts from blockscout_db::entity is consistent with the data flow for verified contracts.


16-19: Additional sea-orm imports
The sea_orm imports (e.g., sea_query and QueryFilter) are relevant and properly scoped.


23-34: Verify Index on InsertedAt for performance
The query filters on InsertedAt; ensure there is an appropriate database index to optimize counting for large tables. If not, performance may degrade over time.


36-37: Remote database source type looks good
Defining TotalVerifiedContractsRemote succinctly delegates query logic to the new statement.


59-59: Check necessity of MapToString
The final alias leverages MapToString<TotalVerifiedContractsRemote>. Confirm that string conversion is needed for consumer layers.

stats/stats/src/data_source/kinds/remote_db/query/mod.rs (1)

7-7: Addition of PullOneValue and StatementFromUpdateTime exports
These newly exported items will help unify remote query logic. The approach appears consistent with the existing structure.

stats/stats/src/charts/db_interaction/utils.rs (2)

1-3: Imports are appropriate
The chrono and sea_orm imports lay the groundwork for the 24h interval filtering functionality.


4-16: Confirm edge cases for the 24h interval filter
Consider how intervals beyond 24 hours or negative time differences are handled (e.g., future timestamps). If not needed, this is fine. Otherwise, clarify the behavior.

stats/stats/src/charts/counters/mod.rs (2)

5-6: New modules for 24h counters
Introducing new_contracts_24h and new_verified_contracts_24h modules is logically consistent with the new metrics requirement.


28-29: Public exports for 24h counters
Exporting NewContracts24h and NewVerifiedContracts24h ensures they are accessible for chart generation.

stats/stats/src/charts/counters/total_contracts.rs (1)

Line range hint 17-31: Selective query usage and performance note

  • Adding .select_only() helps optimize the query to return only required columns.
  • The performance comment is valid; ensure InsertedAt is indexed to keep the counting query efficient as data grows.
stats/stats/src/data_source/kinds/remote_db/mod.rs (1)

37-38: Exports of new query types look good.
These new exports (PullOneValue and StatementFromUpdateTime) expand the module’s capabilities without altering the existing RemoteDatabaseSource logic.

stats/stats/src/charts/counters/new_verified_contracts_24h.rs (9)

1-12: Imports are consistent and concise.
They logically group the traits and types needed for building the chart source and referencing DB entities.


14-19: Entity imports for smart_contracts
It's correct that you're using the SeaORM entity to build queries for the smart_contracts table.


20-21: Clear statement representation.
NewVerifiedContracts24hStatement is properly scoped and named to reflect its purpose of counting new verified contracts over 24 hours.


38-39: Remote source type alias is helpful.
Defining NewVerifiedContracts24hRemote clarifies the data flow from the remote DB source for single-value retrieval.


41-41: Properties struct is a minimal chart descriptor.
Good approach — a single source of truth for naming and metadata.


43-47: Naming convention consistent with other charts.
"newVerifiedContracts24h" aligns with naming patterns used across the codebase.


49-58: Chart properties align with the standard approach.
Using ChartType::Counter and MissingDatePolicy::FillPrevious is consistent with typical “counter” charts.


63-64: Local DB chart type alias
NewVerifiedContracts24h references the DirectPointLocalDbChartSource, consistent with other charts.


66-78: Integration test is well-structured.
Although ignored for now, it provides a direct verification approach once a database is available.

stats/stats/src/charts/counters/new_contracts_24h.rs (11)

1-12: Imports are coherent.
They match the structure for integrating chart code with SeaORM utilities.


14-19: Entity-based logic fosters maintainability.
Referencing the transactions entity ensures good modularity when building statements.


20-21: Struct name matches its function.
NewContracts24hStatement is intuitive and descriptive.


22-36: Query filter usage is consistent with the contract creation approach.
Filtering on Status.eq(1) and time-based logic is presumably correct for this domain.

Do confirm that Status=1 is indeed the correct numeric code for contract-creation success. If needed, we can run a verification script searching for references to status codes. Would you like that?


38-39: Single-value remote source type alias aligns with the preceding pattern.
This is consistent with the newly introduced PullOneValue usage.


42-42: Properties struct declared.
No issues; it's aligned with the naming pattern.


44-47: Aligned ID for the chart.
"newContracts24h" maintains consistency.


49-59: Resolution, chart type, and missing date policy are standard.
No concerns found.


61-63: Helpful clarifications.
The documentation states how internal transactions are excluded for performance.


63-64: Local DB chart source type alias
Consistent approach, no issues.


66-81: Test design matches the pattern from other counters.
Including an ignored integration test is helpful for future real-world validation.

stats/stats-server/tests/it/counters.rs (1)

52-53: Updates to expected counter IDs.
Adding "newContracts24h" and "newVerifiedContracts24h" ensures integration test coverage for new metrics.

stats/stats/src/update_groups.rs (1)

227-240: Group Composition Looks Good

Great job bundling these charts into a single group (VerifiedContractsPageGroup) for practical scheduling. This ensures closely-related metrics update together without overloading the database.

stats/config/utils/durations/durations_producer.py (2)

58-151: Ensure accurate unit conversion when reading durations

Within process_durations, you assume the CSV stores time in seconds and convert by dividing by 60. If there's any mismatch in how data is produced, it could lead to inaccurate partitioning. It may be worth adding a small assertion or documentation note about the CSV format.


153-203: CLI design is user-friendly

Using Typer for command-line arguments and structured error handling provides a great developer experience. The code is readable, well-organized, and easy to extend.

stats/stats-server/src/runtime_setup.rs (1)

299-299: New group integrated seamlessly

Adding Arc::new(VerifiedContractsPageGroup) extends the runtime setup without introducing complexity. This aligns well with the existing flow in all_update_groups().

stats/stats/src/tests/mock_blockscout.rs (2)

471-480: Good approach to tracking contract indexing time

Adding created_contract_code_indexed_at for contract creation transactions is a solid step toward better metric accuracy. Be sure to validate that the 10-minute offset is correct for all contract-based indexing scenarios.


509-509: Field addition properly references the new logic

The created_contract_code_indexed_at field in transactions::ActiveModel correctly references the newly introduced logic. It cleanly integrates into the existing model and ensures indexing time can be tracked for each transaction.

stats/config/utils/requirements.txt (1)

3-3: Verify pandas usage and version compatibility

The addition of pandas aligns with the chart migration goals. However, let's verify its usage in the codebase.

✅ Verification successful

Pandas dependency is correctly used and version is appropriate

The pandas library is used in durations_producer.py for reading CSV files using pd.read_csv(). The specified version requirement pandas>=2.2.3 is appropriate as:

  • It's being used for basic CSV reading functionality which is stable across versions
  • The version is recent and includes security fixes
  • The usage is minimal and doesn't rely on newer pandas features
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for pandas imports and usage
rg -l 'import pandas|from pandas' stats/config/utils/

Length of output: 106


Script:

#!/bin/bash
# Check pandas usage in durations_producer.py
rg -A 5 'import pandas|from pandas' stats/config/utils/durations/durations_producer.py

Length of output: 213


Script:

#!/bin/bash
# Check how pandas is used in the file
ast-grep --pattern 'pd.$_' stats/config/utils/durations/durations_producer.py

Length of output: 173

stats/config/layout.json (1)

7-8: Verify counter definitions in charts.json

The placement of new counters is logical. Let's verify their definitions in charts.json.

✅ Verification successful

Counter definitions are properly configured in charts.json

Both counters are well-defined in charts.json with appropriate titles and descriptions:

  • new_contracts_24h: "Contracts indexed within last 24h"
  • new_verified_contracts_24h: "Number of verified contracts within last 24h"

The definitions are consistent with their intended purpose and follow the same pattern as other similar counters in the configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for counter definitions in charts.json
rg -A 10 '"new_contracts_24h"|"new_verified_contracts_24h"' stats/config/charts.json

Length of output: 958

stats/config/update_groups.json (2)

36-36: Verify impact of schedule change for new_accounts_group

The schedule for new_accounts_group has been moved one hour earlier. Let's verify there are no timing dependencies affected by this change.

✅ Verification successful

Let me gather more information about the dependencies and timing relationships.


Schedule change for new_accounts_group is safe to proceed

Based on the analysis of the codebase:

  • The new_accounts_group is responsible for updating three related metrics: new_accounts, accounts_growth, and total_accounts
  • The duration setting in durations.json shows this group takes 102 seconds to complete
  • Moving the schedule from 5:00 to 4:00 doesn't conflict with any dependent jobs as:
    • No other groups directly depend on this group's execution
    • The 102-second duration is well within the one-hour time slot
    • The metrics are daily aggregates, so the exact hour doesn't impact data accuracy
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for references to new_accounts_group in the codebase
rg "new_accounts_group" stats/

Length of output: 564


Script:

#!/bin/bash
# Check the migration file to understand the context of these groups
cat stats/config/migration/m20240618_080000_update_groups_and_separate_layout/split.py

# Check durations file to understand timing constraints
cat stats/config/utils/durations/durations.json

Length of output: 11522


43-43: Verify verified_contracts_page_group implementation

The new group runs every 30 minutes. Let's verify its implementation and dependencies.

✅ Verification successful

Verified contracts page group implementation is properly configured

The implementation appears to be correct and well-integrated:

  • The group is defined as VerifiedContractsPageGroup in update_groups.rs and includes relevant charts for tracking verified contracts
  • The timing "0 15,45 * * * * *" (every 30 minutes) is appropriate for the charts it updates:
    • Total contracts and verified contracts counts
    • 24h new contracts and verified contracts metrics
  • The implementation is supported by dedicated chart modules in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for group definition and usage
ast-grep --pattern 'struct verified_contracts_page_group {
  $$$
}' stats/stats/src/update_groups.rs

# Check for any timing dependencies with other groups
rg "verified_contracts_page_group" stats/

Length of output: 237


Script:

#!/bin/bash
# Let's check for the implementation details and dependencies
rg -A 10 "verified_contracts_page_group" stats/stats/src/update_groups.rs

# Check if there are any other files that might be using this group
fd . stats/ -t f -e rs -x rg -l "verified_contracts_page_group" {}

Length of output: 142


Script:

#!/bin/bash
# Let's try a broader search to understand the context
rg -l "struct.*Page" stats/stats/src/update_groups.rs

# Also check for any page-related implementations
rg -A 5 "Page" stats/stats/src/update_groups.rs

# Look for any verified contracts related code
rg -l "verified.*contract" stats/

Length of output: 1009

stats/config/charts.json (1)

Line range hint 72-87: Verify complete integration of new counters.

The new counters appear to be part of a larger chart migration effort. Let's verify their complete integration across the system.

✅ Verification successful

Integration of new counters is properly implemented across the system

The verification shows complete integration of the new counters:

  • Both counters are properly ordered in layout.json
  • Counter implementations exist in dedicated Rust modules:
    • stats/stats/src/charts/counters/new_contracts_24h.rs
    • stats/stats/src/charts/counters/new_verified_contracts_24h.rs
  • Test coverage is present in stats/stats-server/tests/it/counters.rs
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify integration of new counters across the system

# Check if counters are properly ordered in layout.json
echo "Checking layout.json for counter ordering..."
rg -A 10 '"counters_order"' stats/config/layout.json

# Check if counter implementations exist in Rust
echo "Checking for counter implementations..."
rg -l "new_contracts_24h|new_verified_contracts_24h" stats/stats/src/charts/counters/

# Check if counters are covered in tests
echo "Checking test coverage..."
rg -A 5 "newContracts24h|newVerifiedContracts24h" stats/stats-server/tests/

Length of output: 1422

stats/config/charts.json Show resolved Hide resolved
stats/config/charts.json Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (15)
stats/config/utils/free_timeslots/README.md (4)

3-6: Installation instructions could be more comprehensive.

While the basic installation steps are provided, consider enhancing them for better user experience.

Add these improvements:

 ## Preparations

 1. Install tkinter (e.g. `apt-get install python3-tk` or `brew install python-tk`) for `find_free_timeslot.py` GUI
-2. Install other dependencies from `requirements.txt`: `pip install -r requirements.txt`
+2. Install other dependencies from `requirements.txt`:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. Verify installation:
+   ```bash
+   python -c "import tkinter; import pandas; import croniter; print('All dependencies installed successfully!')"
+   ```

8-17: Improve clarity in tool description and usage section.

The description needs minor grammatical improvements and could benefit from more detailed usage examples.

Apply these refinements:

 ## `find_free_timeslot.py`

-It's a tool to roughly visualize the busyness of update schedule to find a timeslot for some new update group.
+It's a tool to roughly visualize the busyness of the update schedule to find a timeslot for some new update group.

 ### Usage

-Just run `python find_free_timeslot.py` and use GUI to find less crowded timeslots. 
+Just run `python find_free_timeslot.py` and use the GUI to find less crowded timeslots. 
 You can regenerate durations config for more accurate representation.
-See below for details
+See below for details.
+
+Example usage:
+```bash
+python find_free_timeslot.py
+```
🧰 Tools
🪛 LanguageTool

[uncategorized] ~10-~10: Possible missing article found.
Context: ...ol to roughly visualize the busyness of update schedule to find a timeslot for some ne...

(AI_HYDRA_LEO_MISSING_AN)


[uncategorized] ~14-~14: You might be missing the article “the” here.
Context: ... python find_free_timeslot.py and use GUI to find less crowded timeslots. You ca...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


24-31: Enhance the data extraction steps with more context.

The Grafana data extraction steps could be clearer with additional context and formatting.

Consider restructuring the steps like this:

-1. Get data fetch time statistics (e.g. from grafana) (example: `data.csv.example`).  In our case, you can:
+1. Get data fetch time statistics from Grafana (see `data.csv.example` for the expected format):
+   
+   Navigate to Grafana:
     - Open "Microservices > Stats" dashboard
     - Find "Average data fetch time"
-    - [Three dots] > [Inspect] > [Data]
-    - [Data options] > [Show data frame] > [Series joined by time]
-    - [Formatted data] = off
-    - [Download CSV]
+   
+   Export the data:
+    - Click [Three dots] > [Inspect] > [Data]
+    - Under [Data options], select [Show data frame] > [Series joined by time]
+    - Set [Formatted data] = off
+    - Click [Download CSV]

31-32: Add more details about script parameters.

The reference to --help could be more informative.

Add example command and common parameters:

-2. Run the script (preferably from this folder, to correctly use default parameters) (see `--help` for details)
-3. Enjoy newly generated `durations.json` after running `find_free_timeslot` script.
+2. Run the script from this folder to use default parameters:
+   ```bash
+   python generate_durations.py data.csv
+   # Or see available options:
+   python generate_durations.py --help
+   ```
+
+3. The script will generate `durations.json`, which will be automatically used by the `find_free_timeslot` script
+   for more accurate visualization.
stats/config/utils/free_timeslots/durations/durations.json (1)

36-39: Consider performance impact of frequent updates

The following metrics are updated every time unit:

"new_accounts_group": 102,
"new_contracts_group": 1,
"new_txns_group": 1,
"new_verified_contracts_group": 1,

While new_accounts_group has a longer interval (102 units), the other metrics update very frequently. This could impact performance if these queries are resource-intensive.

Consider:

  1. Evaluating if such frequent updates are necessary for these metrics
  2. Implementing caching mechanisms if not already in place
  3. Using incremental updates where possible instead of full recalculations
stats/config/utils/free_timeslots/find_free_timeslot.py (8)

10-16: Consider renaming enum for clarity.
DurationMenu is a fine approach, but naming it DurationSource might be more descriptive.

-class DurationMenu(Enum):
+class DurationSource(Enum):
     MANUAL = "manual"
     CONFIG = "config"

Line range hint 23-60: Encapsulate configuration loading and initialization.
The constructor does multiple actions (config load, default path checks, GUI setup). For clarity and maintainability, consider extracting config loading into a separate function or class responsible for reading from JSON.


60-148: Consider splitting GUI setup for readability.
setup_gui is quite large, creating multiple frames, plus the calendar, canvas, and schedule list. Splitting it into smaller helper functions (e.g., setup_top_controls(), setup_calendar(), etc.) can improve readability and maintainability.


181-181: Well-structured color generation.
The approach using HSV ensures a smooth color transition. If you want a different visual effect, you could consider alternative hues or color gradients.

Also applies to: 187-187, 189-189


190-215: Optional logging for empty schedules or parse errors.
The code gracefully returns an empty list when croniter fails, but consider logging or user feedback to surface scheduling errors.


218-246: Time complexity is acceptable, but watch for large durations.
get_task_overlaps loops through up to 1440 minutes per day. This is acceptable for moderate usage. If tasks can have extremely large durations or lists, consider early-limiting or rethinking the approach.


248-293: Refactor drawing logic for maintainability.
update_visualization does the timeline drawing, color-coding, and concurrency calculations in one place. Consider isolating the drawing code in a helper method to keep logic more concise and testable.


310-341: Hover event logic is clear and robust.
Using canvas tags to identify tasks is a smart solution. The code gracefully handles no matching tags. For large numbers of tasks, consider shaping the message or grouping.

stats/config/utils/free_timeslots/generate_durations.py (2)

1-7: Describe assumptions about CSV formatting for consistency.

Currently, the script assumes the CSV contains at least one row for durations and a "Time" column that should be skipped while reading. In scenarios where the CSV is empty or lacks expected columns, the script might fail. Consider adding checks or warnings for file validation to ensure robust behavior and prevent runtime errors.


153-200: Provide user-friendly error messages with possible fixes.

When an exception occurs, the user sees Error: <message>. Consider adding a hint or recommended steps, such as checking CSV format or Rust file content, to guide resolution of common issues.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85cd2c9 and b898754.

📒 Files selected for processing (9)
  • stats/config/utils/README.md (0 hunks)
  • stats/config/utils/free_timeslots/README.md (1 hunks)
  • stats/config/utils/free_timeslots/durations/.gitignore (1 hunks)
  • stats/config/utils/free_timeslots/durations/data.csv.example (1 hunks)
  • stats/config/utils/free_timeslots/durations/durations.json (1 hunks)
  • stats/config/utils/free_timeslots/find_free_timeslot.py (3 hunks)
  • stats/config/utils/free_timeslots/generate_durations.py (1 hunks)
  • stats/config/utils/free_timeslots/requirements.txt (1 hunks)
  • stats/stats/src/update_groups.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • stats/config/utils/README.md
✅ Files skipped from review due to trivial changes (3)
  • stats/config/utils/free_timeslots/durations/.gitignore
  • stats/config/utils/free_timeslots/durations/data.csv.example
  • stats/config/utils/free_timeslots/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • stats/stats/src/update_groups.rs
🧰 Additional context used
🪛 LanguageTool
stats/config/utils/free_timeslots/README.md

[uncategorized] ~10-~10: Possible missing article found.
Context: ...ol to roughly visualize the busyness of update schedule to find a timeslot for some ne...

(AI_HYDRA_LEO_MISSING_AN)


[uncategorized] ~14-~14: You might be missing the article “the” here.
Context: ... python find_free_timeslot.py and use GUI to find less crowded timeslots. You ca...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🔇 Additional comments (11)
stats/config/utils/free_timeslots/README.md (1)

1-32: Overall documentation structure is good!

The README provides clear instructions and covers the essential aspects of the tool. The suggested improvements above will enhance readability and user experience, but the current version is already functional and informative.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~10-~10: Possible missing article found.
Context: ...ol to roughly visualize the busyness of update schedule to find a timeslot for some ne...

(AI_HYDRA_LEO_MISSING_AN)


[uncategorized] ~14-~14: You might be missing the article “the” here.
Context: ... python find_free_timeslot.py and use GUI to find less crowded timeslots. You ca...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

stats/config/utils/free_timeslots/durations/durations.json (2)

12-23: Verify the necessity of multiple recurrence periods

The configuration includes recurring account metrics for multiple periods (60, 90, 120 days) and frequencies (daily, weekly, monthly, yearly). This creates 12 different metrics that need to be maintained.

Let's check if all these metrics are actively used:

✅ Verification successful

Let me try another search strategy to find any code that might be using these metrics.


Multiple recurrence periods are intentionally configured but currently disabled

Based on the codebase analysis:

  • All three recurrence periods (60, 90, 120 days) are properly implemented with corresponding Rust types and test cases in active_recurring_accounts.rs
  • The metrics are configured in the layout (stats/config/layout.json) and have scheduled updates in update_groups.json
  • However, all three charts are explicitly disabled in charts.json with "enabled": false

This suggests that these metrics were intentionally implemented but are currently disabled, likely for future use or A/B testing. The multiple periods allow for different analysis windows of recurring account activity, which could be valuable for different business insights.

Locations that confirm this:

  • stats/config/charts.json: All three charts are defined but disabled
  • stats/config/layout.json: Charts are included in the "Accounts" section layout
  • stats/stats/src/charts/lines/active_recurring_accounts.rs: Full implementation with tests
  • stats/config/update_groups.json: Scheduled updates configured for all metrics
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of recurring account metrics in the codebase
# Expected: Find references to these metrics in charts or UI components

# Search for metric usage in code
rg "recurring_accounts.*recurrence(60|90|120)_days" --type rust --type typescript --type javascript

# Search for metric names in chart configurations
fd -e json -e yaml -e toml | xargs rg "recurring_accounts.*recurrence(60|90|120)_days"

Length of output: 1741


Script:

#!/bin/bash
# Search for any files that might be processing these metrics
rg -l "recurring_accounts" --type-list
rg "recurring_accounts" -A 5 -B 5

# Look for any test files or specs that might be using these metrics
fd -e test.js -e spec.js -e test.ts -e spec.ts | xargs rg "recurring_accounts"

# Check if there are any SQL files that might be generating these metrics
fd -e sql | xargs rg "recurring_accounts"

Length of output: 24435


1-42: Document the time unit and explain update frequency variations

The configuration defines update frequencies for various blockchain metrics, but there are a few concerns:

  1. The time unit for these durations is not documented (seconds/minutes/hours?)
  2. There are unexplained variations in update frequencies:
    • Most metrics: 1 time unit
    • new_accounts_group: 102 time units
    • txns_success_rate_group: 2 time units
    • completed_txns_group: 19 time units

Let's verify if these frequencies match the original configuration:

✅ Verification successful

Time units are in minutes, update frequencies are intentionally varied

Based on the Python script that generates the durations.json file, I can clarify:

  1. The time units in durations.json are in minutes (converted from seconds in the source data)
  2. The varying update frequencies are intentionally calculated based on the actual duration of operations for each metric:
    • The script reads raw durations from a CSV file
    • Converts them from seconds to minutes
    • Ensures a minimum of 1 minute for each group
    • Groups with multiple charts sum up their individual durations

The seemingly inconsistent values like 102, 19, and 2 minutes are not arbitrary but derived from the actual processing times of their respective operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the original configuration file mentioned in PR description
# Expected: Find similar update frequencies in the original config

curl -s "https://raw.githubusercontent.com/blockscout/blockscout/367dd7e48dc93eb556670659783c70cbb720fc51/apps/explorer/config/config.exs" | grep -A 10 "stats_period"

Length of output: 164


Script:

#!/bin/bash
# Let's search for any documentation or references about these time units in the codebase
rg -i "time.*unit|duration.*unit|update.*frequenc|stats.*period" --type md --type txt

# Also search for any code that uses these durations
ast-grep --pattern 'durations.json'

# Search for any configuration or documentation files related to stats
fd -e json -e md -e txt -e yaml -e yml stats

Length of output: 806


Script:

#!/bin/bash
# Let's check the stats service implementation for more details about the time units
rg -A 5 -B 5 "duration.*group|stats.*period" stats/

# Also check for any scheduler or timer related code
ast-grep --pattern 'time.Duration'

# Look for any configuration loading code
ast-grep --pattern 'durations.*json' -l | xargs cat

Length of output: 66125


Script:

#!/bin/bash
# Let's check the Python script that generates these durations
cat stats/config/utils/free_timeslots/generate_durations.py

# Also check for any config files that might document the time units
fd -e yaml -e toml -e conf -e ini -e env stats/config/

Length of output: 7041

stats/config/utils/free_timeslots/find_free_timeslot.py (4)

158-160: Validate the cron field count.
The conversion from 7-field to 5-field cron looks fine, but if the string has fewer than 7 fields, it falls through. Consider logging a warning or returning early for nonconforming cron strings.


163-176: Robust handling of JSON loading.
The load_json function uses a try-except block to prevent crashes and informs the user of issues. This is a best practice for user-facing file loads.


299-301: Efficient update of schedule list.
Rebuilding the TreeView ensures a clean refresh. Implementation here looks solid and aligns well with the rest of the code.


304-307: Simple, clear date selection handling.
Switching both visualization and schedule listing upon date selection is a straightforward approach.

stats/config/utils/free_timeslots/generate_durations.py (4)

9-13: Ensure underscore handling for edge cases in convert_camel_to_snake.

The regex-based approach should be effective for typical camelCase patterns. However, consider testing/handling scenarios with multiple uppercase letters in a row (e.g., "HTTPServer") or leading capitals (e.g., "AppSettings").


15-19: Confirm period normalization for a wider set of intervals.

Currently, only four periods are mapped (WEEK, MONTH, YEAR, DAY). If the code might ingest any new or unexpected period strings (e.g., "HOUR" or "QUARTER"), it will return the unchanged string. Make sure that's the intended fallback logic.


21-55: Handle invalid or commented-out lines within group definitions.

The regex-based approach may accidentally capture lines that are commented out or have special characters. Consider a more robust parsing approach if the Rust file's structure evolves (e.g., adding macros or new group definitions).


202-203: Maintain consistent entry point usage.

Using if __name__ == "__main__": is standard for Python CLI tools. Ensure internal or external documentation references python generate_durations.py ... usage if needed, so that end-users know how to invoke it properly.

@bragov4ik bragov4ik requested a review from sevenzing January 2, 2025 11:58
@bragov4ik bragov4ik merged commit 1c70b84 into main Jan 7, 2025
6 checks passed
@bragov4ik bragov4ik deleted the bragov4ik/stats-migrate-backend-charts-7 branch January 7, 2025 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants